Coding problem statement:

The goal is to forecast the requested_qty amounts for the month of June and July in 2020. Currently the sell_in_qty and requested_qty for these two months are null.

In order to predict the requested quantity amount and sell in quantity amount, I am going to use Long Short-term Memory (LSTM) method which is a popular tool in Deep Learning. We are going to use Keras in order to implement LSTM.

First, we will start by importing required libraries.

Here, I am importing the given data from CSV file.

This is how our data looks like. From the above dataframe it is evident that our data is not sorted based on monthly date. As a time series forecasting prolem it is first rule of the forecasting that the data should be sorted in ascending order based on monthly date. So we sort our data by monthly date.

After sorting this is how our dataframe looks like.

Our dataframe has 6500+ observations of product data over the year of approximately 3 years (2017-2020).

Now let us look into the information about our dataframe. For this we use .info() function which prints information about a DataFrame including the index dtype and column dtypes, non-null values and memory usage.

Here, I am checking the shape of the dataframe.

Correlation Analysis

There is no strong correlation visible between any columns.

Our goal here is to forecast monthly sales in quantity and monthly requested quantity. So we need to aggregate our data at the monthly level and sum up the sale in quantity and requested quantity of the products.

Let us sum by the sale in quantity first.

Now let us sum up by the requested quantity.

So now our dataset is showing aggregated sale in quantity and requested quantity.

Now, checking our data for stationarity.

In order to use time series forecasting models, we need to ensure that our time series data is stationary i.e constant mean, constant variance and constant covariance with time.

So let us plot it and see.

It is not stationary and have an increasing trend over the months. And towards june and july it is a flat line because the dataset has no values there.

Also, for the requested quantity it is not stationary and have an increasing trends over the months. And again towards june and july it is a flat line.

Now let us plot monthly sales and requested quantity overlaying ech other.

1. Sale in quantity

Now let us work with the sale in quantity dataframe first. We start by making the data stationary first.

We can make our data stationary using the difference method. For this we need to get the difference in sell in quatity compared to the previous month and build the model on it.

Now we have the dataframe for modelling the difference. Let us check for the stationarity by plotting the data.

Feature engineering

Now the data is stationary. So we can start developing our feature set now. For this we will need previous monthly sales data to predict the next ones. The look-back period will be 12 for our data.

Now our features are ready. Let us check how good the features explain the variation in the lag(1,12). This can be measured using Adjusted R-squared. The Adjusted R-squared explains the variation in column diff. In order to calculate the Adjusted R-Squared we basically fit a linear regression model and then calculate the adjusted R-squared.

The score is 33% which is slightly towards moderate value.

Using MinMaxScaler here to scale each future between -1 and 1 here.

Now splitting the dataset into train and test set. We will train our model on train set and then use the test set to predict values for sales quantity from February to July.

Now creating feature and label sets from scaled datasets.

Now let us fit our LSTM model.

Now let us do the prediction on test set and see the results.

Since these are scaled data we see the difference we need to see the actual sales prediction.

Now we need to build the dataframe with dates and predictions. Transformed predictions shows some difference. Let us calculate the predicted sales in quantity.

So the monthly sales in quantity for June month is 530704.0 and for July month is 126409.0

Let us the plot these numbers and check how did our model perform.

2. Requested quantity

Now let us work with the requested quantity dataframe. We start by making the data stationary first.

We can make our data stationary using the difference method. For this we need to get the difference in sell in quatity compared to the previous month and build the model on it.

Now we have the dataframe for modelling the difference. Let us check for the stationarity by plotting the data.

Feature engineering

Now the data is stationary. So we can start developing our feature set now. For this we will need previous requested data to predict the next ones. The look-back period will be 12 for our data.

Now our features are ready. Let us check how good the features explain the variation in the lag(1,12). This can be measured using Adjusted R-squared. The Adjusted R-squared explains the variation in column diff. In order to calculate the Adjusted R-Squared we basically fit a linear regression model and then calculate the adjusted R-squared.

Using MinMaxScaler here to scale each future between -1 and 1 here.

Now splitting the dataset into train and test set.

Now creating feature and label sets from scaled datasets.

Now let us fit our LSTM model.

Now let us do the prediction on test set and see the results.

Since these are scaled data we see the difference we need to see the actual sales prediction.

Now we need to build the dataframe with dates and predictions. Transformed predictions shows some difference. Let us calculate the predicted sales in quantity.

So the monthly requested quantity for June month is 616150.0 and for July month is 293471.0.

Let us the plot these numbers and check how did our model perform.